1632. Rank Transform of a Matrix
Hard

Given an m x n matrix, return a new matrix answer where answer[row][col] is the rank of matrix[row][col].

The rank is an integer that represents how large an element is compared to other elements. It is calculated using the following rules:

  • The rank is an integer starting from 1.
  • If two elements p and q are in the same row or column, then:
    • If p < q then rank(p) < rank(q)
    • If p == q then rank(p) == rank(q)
    • If p > q then rank(p) > rank(q)
  • The rank should be as small as possible.

It is guaranteed that answer is unique under the given rules.

 

Example 1:

Input: matrix = [[1,2],[3,4]]
Output: [[1,2],[2,3]]
Explanation:
The rank of matrix[0][0] is 1 because it is the smallest integer in its row and column.
The rank of matrix[0][1] is 2 because matrix[0][1] > matrix[0][0] and matrix[0][0] is rank 1.
The rank of matrix[1][0] is 2 because matrix[1][0] > matrix[0][0] and matrix[0][0] is rank 1.
The rank of matrix[1][1] is 3 because matrix[1][1] > matrix[0][1], matrix[1][1] > matrix[1][0], and both matrix[0][1] and matrix[1][0] are rank 2.

Example 2:

Input: matrix = [[7,7],[7,7]]
Output: [[1,1],[1,1]]

Example 3:

Input: matrix = [[20,-21,14],[-19,4,19],[22,-47,24],[-19,4,19]]
Output: [[4,2,3],[1,3,4],[5,1,6],[1,3,4]]

Example 4:

Input: matrix = [[7,3,6],[1,4,5],[9,8,2]]
Output: [[5,1,4],[1,2,3],[6,3,1]]

 

Constraints:

  • m == matrix.length
  • n == matrix[i].length
  • 1 <= m, n <= 500
  • -109 <= matrix[row][col] <= 109
Accepted
3,836
Submissions
11,960
Seen this question in a real interview before?
Companies
Related Topics
Similar Questions
Show Hint 1
Sort the cells by value and process them in increasing order.
Show Hint 2
The rank of a cell is the maximum rank in its row and column plus one.
Show Hint 3
Handle the equal cells by treating them as components using a union-find data structure.

Average Rating: 4.75 (16 votes)

Premium

Solution


Overview

This problem is an extension of the original problem, Rank Transform of an Array. However, the original method in the original problem does not work. To tackle this, we need to add some similar methods used in Most Stones Removed with Same Row or Column. Moreover, to avoid Time Limit Exceeded, some optimization tricks should be applied.

It's indeed a hard problem, so please don't be frustrated if you can not solve it.

Below, we will discuss three approaches: Sorting + BFS, Sorting + DFS, and Sorting + Union-Find.

It's recommended to start reading from approach 1. Also, it's a long article, so take your time to read it.


Approach 1: Sorting + BFS

Intuition

Let's recall the method used in the original Rank Transform of an Array. The idea is simple: sort the values in the array, and arrange the ranks from the lowest value to the highest value.

It's natural to consider applying the same thing to our matrix: sort the values in the matrix, and arrange the ranks from the lowest one to the highest one.

However, this method does not work. In this problem, we are only required to rank values according to row and column, and not to the whole matrix. The condition is looser.

If we arrange the ranks according to the whole matrix, the resulting rank will be huger than what we want.

For example, consider this case:

Figure 1

In this case, if we just rank [2, 3, 4, 5] by their values as [1, 2, 3, 4], we will get a larger rank for some elements.

We need to make some modifications to get our solution to work.

The idea of sorting and ranking from small value to large value is good. The only problem is that the rank is larger than required. We want to reduce the rank to as small as possible.

When arranging ranks, we can check existing ranks in the same row and the same column, and let the rank be the largest rank checked plus one.

For example, in the above matrix, when we fill in the rank of value 4 (corresponding order is 3):

Figure 2

The pseudo-code is as below. Let the required rank matrix be answer.

initial answer to all zero
for (i, j) in sorted_order:
    rank = -1
    for row in 0...m-1:
        rank = max(rank, answer[row][j] + 1)
    for col in 0...n-1:
        rank = max(rank, answer[i][col] + 1)
    answer[i][j] = rank

However, this approach still can not achieve the target rank matrix. There are two problems, and we will discuss that later.

Now, let's analyze the complexity first.

It's recommended to find out an entire working approach and then optimize it. However, for the convenience of writing, we do the optimization here.

Let MM be the number of rows and NN be the number of columns.

Since there is O(NM)\mathcal{O}(NM) points in the matrix, and for each point, we are required to search the row and column to determine its rank, the overall time complexity is O(NM(N+M))\mathcal{O}(NM\cdot(N+M)).

In the worst cases, where M=500M=500 and N=500N=500, NM(N+M)=500500(500+500)=2.5108NM\cdot(N+M) = 500 \cdot 500 \cdot (500 + 500) = 2.5 * 10^8.

Generally, to avoid Time Limit Exceeded, a complexity less than 10910^9 is needed. 10810^8 is dangerous. Can we simplify it?

Notice that we calculated the max of each row and each column many times. We can pre-calculate the maximum before iteration and update that max during the iteration.

We can use two arrays, rowMax and colMax, to record the maximum rank of each row and each column, respectively.

rowMax[i] means the max rank in i row, and colMax[j] means the max rank in j column.

Take the above example again. If we use these two arrays, we calculate ranks in this way:

Figure 3

The pseudo-code is as below.

initial answer to all zero
initial rowMax and colMax to all zero
for (i, j) in sorted_order:
    rank = max(rowMax[i], colMax[i]) + 1
    answer[i][j] = rank
    update rowMax and colMax

Good. Now we only need O(1)\mathcal{O}(1) for each point. The overall complexity is O(NM)\mathcal{O}(NM) for this part. Notice that sorting requires O(NMlog(NM))\mathcal{O}(NM\log(NM)), so the complexity so far is O(NMlog(NM))\mathcal{O}(NM\log(NM)).

Go back to our approach. In the above, we mention that there are two problems in the code.

The first one is that the minimal rank is not always the maximum of other ranks in the same row and columns plus one. It might be the same as the maximum.

For instance, consider this case:

Figure 4

We can see that, if the value is the same, we may not need to add one to the maximum rank.

Well... we can use some if-conditions to solve this problem.

The second problem is even worse: we may need to adjust the previous rank we set before.

Take the below case as an example. In this case, we have filled all the rank matrix except the right-down corner.

Figure 5

Since points with the same value in the same row or same column should share the same rank, we need to adjust the other 33.

So, how to solve this problem?

Let's dig into what parts should be adjusted.

Consider this case:

Figure 6

Note that the connected points should share the same rank, since they are connected by some "same row or same column" connections.

Also, there is one single 11 and one single 33 that do not connect to any other points, since they do not have such connection.

In conclusion, the points with the same value connected by the "same row or same column" connection should share the same rank. Let's call those points "connected part".

Connected part means a group of points with the same value, where any two points can be linked by a path consisting of horizontal lines ("the same row" connection) and vertical lines ("the same column" connection).

To avoid adjusting, we can find out the whole connected part's maximum rank first and then update that rank to each point in the part.

In this way, we can also avoid the first problem because the ranks in different connected parts are always different.

Now the question remaining is how to find the "connected parts"?

This question is similar to Most Stones Removed with Same Row or Column, where we need to find the numbers of connected parts.

In Most Stones Removed with Same Row or Column, there are three methods to locate the connected parts: BFS, DFS, and Union-Find. Here, we discuss BFS first.

In approach 1, we discuss BFS first and discuss the remaining two in approach 2 and approach 3, respectively.

The idea of BFS is simple: from a starting point, add all the directly connected points (i.e., with the same row or same column) into a waiting queue, pop points from the waiting queue, add new directly connected points into the queue, and repeat until the queue is empty.

add starting points to Queue q
while q is not empty:
    point p = q.pop()
    add p's adjacent points to q, skip visited points

This search costs O(V+E)\mathcal{O}(V + E), where VV is the number of points and EE is the number of edges in the graph, since we visit each point and each edge constant times.

We have O(NM)\mathcal{O}(NM) points, and if every two points are connected, the number of edges are O((NM)2)\mathcal{O}((NM)^2). Therefore, in worst case, O(V+E)=O((NM)2)=5004=6.251010\mathcal{O}(V + E) = \mathcal{O}((NM)^2) = 500^4 = 6.25 * 10^{10}. This will absolutely cause Time Limit Exceeded.

Can we simplify it?

Instead of connecting point to point, we can connect row to column, and column to row.

Consider viewing a point (i, j) as an edge linking i-th row and j-th column.

For example, see this case:

Figure 7

For a point (i, j), we connect i-th row and j-th column together. With this graph, we can easily find the connected parts.

For example, in the graph above, starting from (0, 0), searching the neighbors of Row 0, we can find Col 0, and Col 2. Therefore, (0, 0) and (0, 2) are connected.

Continue searching the neighbors of Col2, we can find Row 0 and Row 2. We have visited Row 0, but Row 2 is new. Hence, (0, 0), (0, 2), and (2, 2) are connected. Search the neighbors of Row 2, we find nothing new. For (0, 0), we can also search the neighbors of Col 0.

After this search, we get (0, 0)'s connected parts: (0, 0), (0, 2), and (2, 2).

Now we need to store the graph. We can have a map graphRow, where graphRow[i] represent the columns linked to i-th row, and a map graphCol, where graphCol[j] represent the columns linked to j-th col.

Wait a minute. Can we combine those two maps into a single map?

Note that the indexes of row start from 0 and occupy positive numbers. There are negative numbers that remain unused. We can store indexes of columns in negative numbers.

A natural idea is to use the negative of column index col-\text{col}. But both row and column indexes use zero, resulting in duplication of number zero. We need to shift one unit to avoid repetition: using col1-\text{col} - 1.

Luckily, we happen to have an operator called "complement" (\sim), where col=col1\sim\text{col} = -\text{col} - 1. What's more, simple math shows (col)=col\sim(\sim\text{col}) = \text{col}.

Therefore, we can use a single graph to store the connections between row and column: if i >= 0, graph[i] represents i-th row's neighbors (the complement of indexes of linked columns), and if i < 0, graph[~i] gives ~i-th column's adjacent points (the indexes of linked rows).

It's also OK to use two single maps to represent the connection relationship. People just use this trick to make implementation a bit simpler.

Now, only MM points (represent rows) and NN points (represent columns) are in the graph. Since we can not have edges between rows or between columns, the largest number of edges are O(NM)=2.5105\mathcal{O}(NM) = 2.5 * 10^5. The number is small enough to pass the test.

So, we successfully achieved finding connected parts by BFS. The remaining part is to fill our rank matrix answer by connected parts, in the sorted value order.

initial answer to all zero
initial rowMax and colMax to all zero
for connected_part in sorted_connected_parts:
    rank = -1
    for point (i, j) in connected_part:
        rank = (rank, max(rowMax[i], colMax[i]) + 1)
    for point (i, j) in connected_part:
        answer[i][j] = rank
        update rowMax and colMax

By far, we solve every problem we encounter and cleverly avoid Time Limit Exceeded by some optimization. The essence of this algorithm is to separate points into different connected parts, sort them by values, and finally fill in the rank matrix from the lowest value to the highest value.

For the detail of the algorithm, check the "Algorithm" part.

Algorithm

For convenience,

  • We refer "points" to indexes in the matrix, in the form (row_number, column_number).
  • We refer "value" to the values in the matrix. In other words, the value of point (i, j) is matrix[i][j].
  • We say two points are "connected" if and only if they have the same values and are in the same row or column, or they are all connected to the same point.
  • A "connected part" represents a group of connected points.

Step 1: Initialize graphs for different values. Iterate matrix and link the rows and columns in the corresponding graph.

Step 2: Initialize a value2index map to store connected parts.

  • This map will contain the value - index mapping. In the index part, separate points to put the connected points in the same array, and to put non-connected points in different arrays. (one array represents a connected part.)
  • Therefore, value2index should be in this form: {v1: [[point1, point2, ...], [point11, point12, ...], ...], v2: ...}, where point1, point2, ... are connected, and point11, point21, ... are also connected, but none of points from different array are connected.

Step 3: Fill in value2index map by iterating over matrix again.

  • For each point, use BFS to find out all the other connected points. Put all of them into value2index as an array.
  • Remember to mark those points visited to avoid duplicate additions.

Step 4: Sort the keys in value2index (i.e., all values in matrix).

Step 5: Initialize our answer matrix. Iterate value2index in the order of sorted keys to fill in answer.

  • For a given key (i.e., a value in matrix), we fill in answer by connected parts (i.e., one array).
  • Note that for points in the same connected part, they share the same rank.
  • For a connected part, Find out the minimum possible rank and update that rank.
  • To reduce the time for searching the minimum possible rank, we need two arrays to record the maximum rank of each row and each column, respectively.

Step 6: Return answer.

Challenge: Can you implement the code yourself without seeing our implementations?

Implementation

Complexity Analysis

Let MM be the number of rows in matrix and NN be the number of columns in matrix.

  • Time Complexity: O(NMlog(NM))\mathcal{O}(NM\log(NM)).

    • We need O(NM)\mathcal{O}(NM) to iterate matrix to build graphs.
    • We need O(NM)\mathcal{O}(NM) to iterate matrix to build value2index. We only visit points at most twice, since we skip points visited in BFS.
    • We need O(NMlog(NM))\mathcal{O}(NM\log(NM)) to sort the keys in value2index, since there are at most O(NM)\mathcal{O}(NM) different keys.
    • We need O(NM)\mathcal{O}(NM) to iterate value2index to build answer.
    • Adding together, the time we needed is O(NMlog(NM))\mathcal{O}(NM\log(NM)).
  • Space Complexity: O(NM)\mathcal{O}(NM).

    • For graphs, we store O(NM)\mathcal{O}(NM) edges (viewing each point as an edge).
    • For value2index, we store O(NM)\mathcal{O}(NM) points.
    • For rowMax and columnMax, they have size of O(M)\mathcal{O}(M) and O(N)\mathcal{O}(N), respectively.
    • In total, the size we needed is O(NM)\mathcal{O}(NM).

Approach 2: Sorting + DFS

Intuition

DFS is similar to BFS but differs in the order of searching. In most cases, when the search space is not huge, you can replace BFS with DFS.

In approach 1, we used BFS to find out the connected parts of each point. Now, we use DFS instead.

Algorithm

Step 1: Initialize graphs for different values. Iterate matrix and link the rows and columns in the corresponding graph.

Step 2: Initialize a value2index map to store connected parts.

  • This map will contain the value - index mapping. In the index part, separate points to put the connected points in the same array, and to put non-connected points in different arrays. (one array represents a connected part.)
  • Therefore, value2index should be in this form: {v1: [[point1, point2, ...], [point11, point12, ...], ...], v2: ...}, where point1, point2, ... are connected, and point11, point21, ... are also connected, but none of the points from different array are connected.

Step 3: Fill in value2index map by iterating over the matrix again.

  • For each point, use DFS to find out all the other connected points. Put all of them into value2index as an array.
  • Remember to mark those points visited to avoid duplicate additions.

Step 4: Sort the keys in value2index (i.e., all values in matrix).

Step 5: Initialize our answer matrix. Iterate value2index in the order of sorted keys to fill in answer.

  • For a given key (i.e., a value in matrix), we fill in answer by connected parts (i.e., one array).
  • Note that for points in the same connected part, they share the same rank.
  • For a connected part, Find out the minimum possible rank and update that rank.
  • To reduce the time for searching the minimum possible rank, we need two arrays to record the maximum rank of each row and each column, respectively.

Step 6: Return answer.

Challenge: Can you implement the code yourself without seeing our implementations?

Implementation

Complexity Analysis

Let MM be the number of rows in matrix and NN be the number of columns in matrix.

  • Time Complexity: O(NMlog(NM))\mathcal{O}(NM\log(NM)).

    • We need O(NM)\mathcal{O}(NM) to iterate matrix to build graphs.
    • We need O(NM)\mathcal{O}(NM) to iterate matrix to build value2index. We only visit points at most twice, since we skip points visited in DFS.
    • We need O(NMlog(NM))\mathcal{O}(NM\log(NM)) to sort the keys in value2index, since there are at most O(NM)\mathcal{O}(NM) different keys.
    • We need O(NM)\mathcal{O}(NM) to iterate value2index to build answer.
    • Adding together, the time we needed is O(NMlog(NM))\mathcal{O}(NM\log(NM)).
  • Space Complexity: O(NM)\mathcal{O}(NM).

    • For graphs, we store O(NM)\mathcal{O}(NM) edges (viewing each point as an edge).
    • For value2index, we store O(NM)\mathcal{O}(NM) points.
    • For rowMax and columnMax, they have size of O(M)\mathcal{O}(M) and O(N)\mathcal{O}(N), respectively.
    • In total, the size we needed is O(NM)\mathcal{O}(NM).

Approach 3: Sorting + Union-Find

Intuition

As we mentioned in approach 1, Union-Find (or UF, Disjoint Set) can be applied to find the connected parts.

Since Union-Find is not the essence of this problem (and considering the length of the article), we will not provide a very detailed explanation of Union-Find here. You can find some tutorials on other problems that require Union-Find, such as Redundant Connection or Most Stones Removed with Same Row or Column.

Now, we will have a quick review of Union-Find, and explain how we can use Union-Find to find the connected parts.

Similar to approach 1, we view the matrix points as edges that connect rows and columns.

As we know, we can view Union-Find as a forest-like structure (forest represents many trees). For example:

Figure 8

To store this structure, we can store the node's parents in a map or an array. The root's parent is itself. A map is used here for illustration.

This structure provides two methods, find and union.

  • For function find, it returns the root of the given node.

  • For function union, it accepts two nodes and merges the two trees that the nodes belong to.

For example, if we want to union node Row2 and node Col1, we will have something like this:

Figure 9

What union needs to do is to assure that find(Row2) and find(Col1) yield the same value. That's it.

There are two main optimizations we can do in Union-Find: path compression and union by rank.

  • Path compression means that when we apply find, we can link the nodes on our search path to the root directly, which will reduce the search time for the next time.

  • For union by rank, when we merge two trees, what we do is to link a tree's root to the other tree's leaf. But which tree's root should be linked? We can assign each tree a rank, and link the low-rank tree's root to the high-rank tree's leaf. The rank can be the size of the tree or the number of layers of the tree.

With path compression and union by rank, if we perform find and union NN times, it can be done in almost O(N)\mathcal{O}(N).

In fact, the time complexity is O(Nα(N))\mathcal{O}(N\alpha(N)), where function α(n)\alpha(n) is inverse Ackermann function, which is much smaller than log(n)\log(n) and approximately constant. The proof of the complexity is complicated, and interested readers can go to Wikipedia to check the detail.

Now, back to our problem. We need to find the connected parts.

We can use the union function to union rows and columns together and use find to determine which connected parts the given point belongs to.

Similar to approach 1, we use 0 and positive numbers to mark the row, and the complement numbers to mark the column.

Algorithm

Step 1: Implement find and union for Union-Find.

  • find receives an integer and returns the "root" of that integer.
  • union accepts two integers and merges them into the same union.

Step 2: Initialize Union-Finds (UFs) for different values. Iterate matrix and union the rows and columns in the corresponding Union-Find.

Step 3: Initialize a value2index map to store connected parts.

  • This map will contain the value - index mapping. In the index part, separate points to put the connected points in the same array, and to put non-connected points in different arrays. (one array represents a connected part.)
  • We mark those array by the "root" of points in Union-Find (so value2index is actually a nested map).
  • Therefore, value2index should be in this form: {v1: {root1: [point1, point2, ...], root2: [point11, point12, ...], ...}, v2: ...}, where point1, point2, ... are connected, and point11, point21, ... are also connected, but none of points from different set are connected.

Step 4: Fill in value2index map by iterate matrix again.

  • For a point, use find to calculate its "root". Put the point in the corresponding set.

Step 5: Sort the keys in value2index (i.e., all values in matrix).

Step 6: Initialize our answer matrix. Iterate value2index in the order of sorted keys to fill in answer.

  • For a given key (i.e., a value in matrix), we fill in answer by connected parts (i.e., one array).
  • Note that for points in the same connected part, they share the same rank.
  • For a connected part, Find out the minimum possible rank and update that rank.
  • To reduce the time for searching the minimum possible rank, we need two arrays to record the maximum rank of each row and each column, respectively.

Step 7: Return answer.

Challenge: Can you implement the code yourself without seeing our implementations?

Implementation

For convenience, we only implement path compression in the code above, and that's enough to pass the test.

Complexity Analysis

Let MM be the number of rows in matrix and NN be the number of columns in matrix.

  • Time Complexity: O(NMlog(NM))\mathcal{O}(NM\log(NM)).

    • We need O(NMlog(NM))\mathcal{O}(NM\log(NM)) to iterate matrix to build UFs. However, with both union by rank and path compression, we can achieve O(NMα(NM))\mathcal{O}(NM\alpha(NM)), where function α(n)\alpha(n) is inverse Ackermann function, which is much smaller than log(n)\log(n) and approximately constant.
    • We need O(NM)\mathcal{O}(NM) to iterate matrix to build value2index.
    • We need O(NMlog(NM))\mathcal{O}(NM\log(NM)) to sort the keys in value2index, since there are at most O(NM)\mathcal{O}(NM) different keys.
    • We need O(NM)\mathcal{O}(NM) to iterate value2index to build answer.
    • Adding together, the time we needed is O(NMlog(NM))\mathcal{O}(NM\log(NM)).
  • Space Complexity: O(NM)\mathcal{O}(NM).

    • For UFs, we store O(NM)\mathcal{O}(NM) edges (viewing each point as an edge).
    • For value2index, we store O(NM)\mathcal{O}(NM) points.
    • For rowMax and columnMax, they have size of O(M)\mathcal{O}(M) and O(N)\mathcal{O}(N), respectively.
    • In total, the size we needed is O(NM)\mathcal{O}(NM).

Report Article Issue

Comments: 3
granola's avatar
Read More

Great explanation. One of the best solution I ever saw on LC. Thanks for sharing!

2
Reply
Share
Report
shukria's avatar
Read More

Thank you for this. Every article should be like this - very clear and concise.

2
Reply
Share
Report
Tonymontaro's avatar
Read More

Excellent editorial 👏🏼

2
Reply
Share
Report

You don't have any submissions yet.

/23
Contribute
|||
Saved
Trie
This list is empty.